Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study

Liu, Yi; Deng, Gelei; Xu, Zhengzi; Li, Yuekang; Zheng, Yaowen; Zhang, Ying; Zhao, Lida; Zhang, Tianwei; Wang, Kailong; Liu, Yang

arXiv:2305.13860 (cs)

[Submitted on 23 May 2023 (v1), last revised 10 Mar 2024 (this version, v2)]

Abstract: Large Language Models (LLMs), like ChatGPT, have demonstrated vast potential but also introduce challenges related to content constraints and potential misuse. Our study investigates three key research questions: (1) the number of different prompt types that can jailbreak LLMs, (2) the effectiveness of jailbreak prompts in circumventing LLM constraints, and (3) the resilience of ChatGPT against these jailbreak prompts. Initially, we develop a classification model to analyze the distribution of existing prompts, identifying ten distinct patterns and three categories of jailbreak prompts. Subsequently, we assess the jailbreak capability of prompts with ChatGPT versions 3.5 and 4.0, utilizing a dataset of 3,120 jailbreak questions across eight prohibited scenarios. Finally, we evaluate the resistance of ChatGPT against jailbreak prompts, finding that the prompts can consistently evade the restrictions in 40 use-case scenarios. The study underscores the importance of prompt structures in jailbreaking LLMs and discusses the challenges of robust jailbreak prompt generation and prevention.

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2305.13860 [cs.SE]
	(or arXiv:2305.13860v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2305.13860

Submission history

From: Yi Liu
[v1] Tue, 23 May 2023 09:33:38 UTC (460 KB)
[v2] Sun, 10 Mar 2024 13:58:08 UTC (460 KB)
